AMENDMENTS TO THE CLAIMS

1. (Currently Amended) A linguistic segmentation tool comprising: 

a lexical feature extraction component configured to receive text and generate lexical feature 
vectors relating to the text, the lexical feature vectors including words from the text and syntactic 
classes of the words, the lexical feature extraction component assigning syntactic classes from a set
of classes including classes for particular word affixes generated automatically from a corpus of text
documents in a given language;

an acoustic feature extraction component configured to receive an audio version of the text 
and generate acoustic feature vectors relating to the audio version of the text; and 

a statistical framework component configured to generate linguistic features associated with 
the text based on the acoustic feature vectors and the lexical feature vectors. 
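To make the affix-class language of claim 1 (and the frequency-based classes of claims 11, 19, 24 and 31) more concrete, here is a minimal sketch of one way affix classes could be derived automatically from a corpus. It assumes that "affix" means a word-final character suffix of one to three characters, that a suffix becomes a class once it appears at least `min_count` times, and that the most frequent words each get their own class. The function names, thresholds and class labels (`learn_suffix_classes`, `SUFFIX_…`, `WORD_…`) are hypothetical illustrations, not the claimed implementation.

```python
from collections import Counter


def learn_suffix_classes(corpus_words, max_len=3, min_count=2):
    """Hypothetical: keep word-final suffixes (1..max_len chars) seen >= min_count times."""
    counts = Counter()
    for word in corpus_words:
        w = word.lower()
        for n in range(1, max_len + 1):
            if len(w) > n:  # require a non-empty stem in front of the suffix
                counts[w[-n:]] += 1
    return {suffix for suffix, c in counts.items() if c >= min_count}


def learn_frequent_words(corpus_words, k):
    """Hypothetical: the k most frequent words each become their own class."""
    counts = Counter(w.lower() for w in corpus_words)
    return {w for w, _ in counts.most_common(k)}


def assign_class(word, suffixes, frequent_words, max_len=3):
    """Frequent-word class first, then the longest matching suffix class, else OTHER."""
    w = word.lower()
    if w in frequent_words:
        return "WORD_" + w
    for n in range(max_len, 0, -1):
        if len(w) > n and w[-n:] in suffixes:
            return "SUFFIX_" + w[-n:]
    return "OTHER"


def lexical_feature_vectors(words, suffixes, frequent_words):
    """One (word, syntactic class) pair per word, as in claim 1's lexical feature vectors."""
    return [(w, assign_class(w, suffixes, frequent_words)) for w in words]
```

Preferring the longest matching suffix means "-ing" is chosen over "-g" when both qualify as classes.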

2. (Original) The linguistic segmentation tool of claim 1, wherein the linguistic 
features include periods, quotation marks, exclamation marks, commas, and phrasal boundaries. 

3. (Original) The linguistic segmentation tool of claim 1, further comprising: 

a transcription component configured to generate the text based on the audio version of the text.

4. (Currently Amended) The linguistic segmentation tool of claim 1, wherein the 
statistical framework component includes: 

an acoustic model configured to estimate a probability of an occurrence of the linguistic
features based on the acoustic feature vectors. 

5. (Currently Amended) The linguistic segmentation tool of claim 4, wherein the 
statistical framework component includes: 

a language model configured to estimate a probability that one of the lexical feature vectors 
corresponds to a text boundary. 

6. (Currently Amended) The linguistic segmentation tool of claim 5, wherein the 
statistical framework component includes: 

a maximum likelihood estimator configured to generate the linguistic features based on the 
probabilities generated by the acoustic model and the language model. 
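Claims 4 to 6 describe combining an acoustic-model probability and a language-model probability to decide which linguistic feature, if any, follows each word. The sketch below shows one simple way to do this: per word, it picks the label that maximizes the combined log-likelihood `log P_ac + lm_weight * log P_lm`. It assumes both models already output a probability per candidate label and treats them as independent; with `lm_weight=1.0` this is the label with the largest product of the two probabilities. The label names, the `lm_weight` knob and the function names are hypothetical.

```python
import math


def ml_label(acoustic_probs, lm_probs, lm_weight=1.0):
    """Return the label maximizing log P_ac(label) + lm_weight * log P_lm(label)."""
    best_label, best_score = None, float("-inf")
    for label, p_ac in acoustic_probs.items():
        p_lm = lm_probs.get(label, 0.0)
        if p_ac <= 0.0 or p_lm <= 0.0:
            continue  # zero likelihood under either model: never chosen
        score = math.log(p_ac) + lm_weight * math.log(p_lm)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def label_sequence(acoustic_seq, lm_seq, lm_weight=1.0):
    """Label each word position independently (no cross-word dependencies)."""
    return [ml_label(a, l, lm_weight) for a, l in zip(acoustic_seq, lm_seq)]
```

For example, acoustic probabilities `{"none": 0.3, "period": 0.6}` and language-model probabilities `{"none": 0.5, "period": 0.4}` give products of 0.15 and 0.24, so `"period"` is selected. A fuller system might decode the whole sequence jointly instead of one word at a time.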

7. (Original) The linguistic segmentation tool of claim 1, wherein the lexical feature 
vectors additionally include an identification of a structured speech member of the word. 

8. (Original) The linguistic segmentation tool of claim 1, wherein the acoustic feature 
vectors are based on prosodic features including at least one of pause, rate, energy, and pitch. 
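As an illustration of the prosodic features named in claim 8, the sketch below computes a per-word acoustic feature vector of (pause, rate, energy, pitch). It assumes word start and end times from a recognizer and a frame list of `(energy, pitch)` pairs at 100 frames per second, with a pitch of 0 marking unvoiced frames. Speaking rate is approximated crudely as characters per second. The `Word` type, the frame layout and all function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def acoustic_feature_vectors(words, frames, frame_rate=100.0):
    """Return one (pause, rate, energy, pitch) tuple per word.

    frames: hypothetical list of (energy, pitch) pairs, one per 1/frame_rate s.
    """
    vectors = []
    for i, w in enumerate(words):
        # Pause: silence between this word's end and the next word's start.
        next_start = words[i + 1].start if i + 1 < len(words) else w.end
        pause = max(0.0, next_start - w.end)
        # Rate: crude proxy, characters per second of the word.
        duration = w.end - w.start
        rate = len(w.text) / duration if duration > 0 else 0.0
        # Energy and pitch: averages over the frames spanned by the word.
        lo = int(w.start * frame_rate)
        hi = max(lo + 1, int(w.end * frame_rate))
        seg = frames[lo:hi] or [(0.0, 0.0)]
        energy = mean([e for e, _ in seg])
        voiced = [p for _, p in seg if p > 0]
        pitch = mean(voiced) if voiced else 0.0
        vectors.append((pause, rate, energy, pitch))
    return vectors
```

A long pause combined with falling pitch is the kind of cue an acoustic model could associate with a period.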

9. (Original) The linguistic segmentation tool of claim 1, wherein the syntactic classes 
are indicative of a role of the word in the text. 

10. (Cancelled)

11. (Currently Amended) The linguistic segmentation tool of claim [[10]] 9, wherein the
set of classes additionally includes syntactic classes defined based on frequently occurring words.

12. (Currently Amended) A method for determining linguistic information for words 
corresponding to a transcribed version of an audio input stream including speech, the method 
comprising: 

generating lexical features for the words, including a syntactic class associated with at least 
one of the words, the syntactic class being selected from a set of classes including classes for
particular word affixes generated automatically from a corpus of text documents in a given
language;

generating acoustic features for the audio input stream, the acoustic features being based on 
at least one of speaker pauses, speaker rate, speaker energy, [[and]] or speaker pitch; and 

generating the linguistic information based on the lexical features and the acoustic features. 

13. (Original) The method of claim 12, further comprising: 

automatically transcribing the audio input stream to generate the words corresponding to the 
transcribed version of the speech. 

14. (Original) The method of claim 12, further comprising: 

creating a language model configured to estimate a probability that the lexical features
correspond to a word boundary based on the lexical features. 

15. (Original) The method of claim 14, further comprising: 

creating an acoustic model configured to estimate a probability of an occurrence of the 
linguistic information based on the acoustic features. 

16. (Original) The method of claim 15, wherein generating the linguistic information 
based on the lexical features and the acoustic features includes using a maximum likelihood 
estimator configured to estimate a final probability of an occurrence of the linguistic information 
based on the probabilities generated by the acoustic model and the language model. 

17. (Original) The method of claim 12, wherein the syntactic class is indicative of the 
role of the at least one of the words. 

18. (Cancelled) 

19. (Currently Amended) The method of claim 12, wherein the syntactic class is further 
selected from a set of classes including classes defined based on word frequency. 

20. (Original) The method of claim 12, wherein the linguistic information includes 
periods, quotation marks, exclamation marks, commas, and phrasal boundaries. 

21. (Currently Amended) A computing device for determining linguistic information for
words corresponding to a transcribed version of an audio input stream that includes speech, the 
computing device comprising: 

a processor; and 

a computer memory coupled to the processor and containing programming instructions that 
when executed by the processor cause the processor to: 

generate lexical features for the words, including a syntactic class associated with at 
least one of the words, the syntactic class being selected from a set of classes including classes for 
particular word affixes generated automatically from a corpus of text documents in a given 
language,

generate acoustic features for the audio input stream, the acoustic features being 
based on at least one of speaker pauses, speaker rate, speaker energy, [[and]] or speaker pitch,

generate the linguistic information based on the lexical features and the acoustic
features, and

output the generated linguistic information as meta-information embedded in the 
transcribed version of the audio input stream. 
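Claim 21 ends with outputting the linguistic information as meta-information embedded in the transcript. A minimal sketch of one possible output format follows: punctuation labels are written directly after the word, and any other label (for example a phrasal boundary) is embedded as an inline XML-style tag. The label names, the `PUNCTUATION` table and the tag syntax are hypothetical choices, not a format taken from the claims.

```python
# Hypothetical mapping from predicted labels to punctuation characters.
PUNCTUATION = {"period": ".", "comma": ",", "exclamation": "!"}


def embed_meta(words, labels):
    """Render words with their predicted labels embedded in the transcript text."""
    parts = []
    for word, label in zip(words, labels):
        if label in PUNCTUATION:
            parts.append(word + PUNCTUATION[label])
        elif label == "none":
            parts.append(word)
        else:
            # Non-punctuation meta-information, e.g. a phrasal boundary.
            parts.append(f"{word} <{label}/>")
    return " ".join(parts)
```

For instance, `embed_meta(["well", "that", "works"], ["comma", "none", "period"])` yields `"well, that works."`.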

22. (Original) The computing device of claim 21, wherein the syntactic class is 
indicative of the role of the at least one of the words. 

23. (Cancelled) 

24. (Currently Amended) The computing device of claim 21, wherein the syntactic class
is further selected from a set of classes including classes defined based on word frequency. 

25. (Currently Amended) A method for associating meta-information with a document 
transcribed from speech, the method comprising: 

building a language model based on lexical feature vectors extracted from the document, the 
lexical feature vectors including words and syntactic classifications of the words, the syntactic 
classifications being selected from a set of classes including classes for particular word affixes 
generated automatically from a corpus of text documents in a given language;

building an acoustic model based on acoustic feature vectors extracted from the speech; and

combining outputs of the language model and the acoustic model in a statistical framework
that estimates a probability for associating the meta-information with the document. 

26. (Original) The method of claim 25, wherein the meta-information relates to 
linguistic features of the document. 

27. (Original) The method of claim 26, wherein the linguistic features include periods, 
quotation marks, exclamation marks, commas, and phrasal boundaries. 

28. (Original) The method of claim 25, wherein the acoustic feature vectors are based on 
prosodic features including pause, rate, energy, and pitch. 

29. (Original) The method of claim 25, wherein the syntactic class is indicative of the
role of the at least one of the words. 

30. (Cancelled) 

31. (Currently Amended) The method of claim 25, wherein the syntactic class is further
selected from a set of classes including classes defined based on word frequency. 

32. (Currently Amended) A device comprising: 

means for building a language model based on lexical feature vectors extracted from a 
document transcribed from human speech, the lexical feature vectors including a word and a 
syntactic classification of the word, the syntactic classification being assigned from a set of classes
including classes for particular word affixes generated automatically from a corpus of text
documents in a given language;

means for building an acoustic model based on acoustic feature vectors extracted from the 
speech; and 

means for combining outputs of the language model and the acoustic model to estimate a 
probability for associating a linguistic feature with the document. 

33. (Currently Amended) A computer-readable medium containing program instructions
for execution by a processor, the program instructions, when executed by the processor, cause the 
processor to perform a method comprising: 

generating lexical features for words corresponding to a transcribed version of speech, the 
lexical features including a syntactic class associated with at least one of the words, the syntactic
class of a word being selected from a set of classes including classes for particular word affixes
generated automatically from a corpus of text documents in a given language;

generating acoustic features for the speech, the acoustic features based on at least one of 
speaker pauses, speaker rate, speaker energy, [[and]] or speaker pitch; and 

generating linguistic information for the words based on the lexical features and the acoustic 
features. 